import pandas as pd
import numpy as np
import plotly.graph_objects as go
from matplotlib import pyplot as plt
For D from 1 to 15 dimensions, simulate 1000 random D-dimensional points, where the value in each dimension is uniformly randomly distributed between -1 and +1.
dimension_count = 15
sample_size = 1000
list_of_D_dim_arrays = []
for D in range(1, dimension_count+1):
    list_of_D_dim_arrays.append(np.random.uniform(-1, 1, size=(sample_size, D)))
Calculate the fraction of these points that are within distance 1 of the origin, giving an approximation of the volume of the unit hypersphere to the hypercube inscribing it. Plot this fraction as a function of D (a scatter plot of D versus the fraction).
x_list = []
y_list = []
for D in range(0, dimension_count):
    number_in_unit_hypersphere = 0  # reset counter for every dimension
    for point in range(0, sample_size):
        # increase counter by 1 if the Euclidean distance between the point and the origin is at most 1
        number_in_unit_hypersphere += int((np.square(list_of_D_dim_arrays[D][point]).sum())**0.5 <= 1)
    x_list.append(D+1)
    y_list.append(number_in_unit_hypersphere/sample_size)
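The per-point inner loop can also be written in vectorized form with `np.linalg.norm`; a minimal sketch (the function name below is illustrative, not from the notebook):

```python
import numpy as np

def fraction_inside_unit_sphere(points):
    """Fraction of rows of `points` whose Euclidean norm is <= 1."""
    distances = np.linalg.norm(points, axis=1)  # one norm per point
    return np.mean(distances <= 1)

# For D = 1 every draw from [-1, 1] lies within distance 1 of the origin,
# so the fraction is exactly 1.0.
points_1d = np.random.uniform(-1, 1, size=(1000, 1))
print(fraction_inside_unit_sphere(points_1d))  # -> 1.0
```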
#Plot this fraction as a function of D
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_list, y=y_list, mode='markers'))
fig.update_layout(title='D versus the fraction',
xaxis_title='D',
yaxis_title='fraction')
fig.show()
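The Monte Carlo fractions can be checked against the exact ratio of the D-ball volume to the cube volume, pi^(D/2) / (Gamma(D/2 + 1) * 2^D). A sketch using only the standard library:

```python
import math

def exact_fraction(D):
    """Exact volume of the unit D-ball divided by the volume of the [-1, 1]^D cube."""
    return math.pi**(D / 2) / (math.gamma(D / 2 + 1) * 2**D)

for D in range(1, 6):
    print(D, exact_fraction(D))
# D = 2 gives pi/4 ~ 0.785 and D = 3 gives pi/6 ~ 0.524,
# matching the factors 4 and 6 used below to recover pi.
```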
Use the value of this fraction at D = 2 and D = 3 to get estimates for the value of pi (Π) as you know the area (for D = 2) and volume (for D = 3) formulae for these cases.
Area of the bounding square = (2r)^2 = 4
Area of the unit circle = pi*r^2 = pi
fraction = pi/4 --> pi = 4*fraction
pi_estimate_D_2 = 4*y_list[1]
print('Estimate of pi for D = 2: ',pi_estimate_D_2)
Estimate of pi for D = 2: 3.212
Volume of the bounding cube = (2r)^3 = 8
Volume of the unit sphere = (4/3)*pi*r^3 = (4/3)*pi
fraction = ((4/3)*pi)/8 = pi/6 --> pi = 6*fraction
pi_estimate_D_3 = 6*y_list[2]
print('Estimate of pi for D = 3: ',pi_estimate_D_3)
Estimate of pi for D = 3: 3.096
Perform the calculations in part (b) with larger sample sizes. You can use the following set: {5000, 10000, 25000, 50000, 100000}. Visualize the estimated Π for the D = 2 and D = 3 cases. Comment on your results.
def estimate_pi(D, sample_size):
    number_in_unit_hypersphere = 0  # set counter to zero
    array_of_points = np.random.uniform(-1, 1, size=(sample_size, D))
    for point in range(0, sample_size):
        # increase counter by 1 if the Euclidean distance between the point and the origin is at most 1
        number_in_unit_hypersphere += int((np.square(array_of_points[point]).sum())**0.5 <= 1)
    if D == 2:
        return 4*(number_in_unit_hypersphere/sample_size)  # fraction = pi/4
    elif D == 3:
        return 6*(number_in_unit_hypersphere/sample_size)  # fraction = pi/6
sample_size_list = [1000, 5000, 10000, 25000, 50000, 100000]
D_2_p_estimates = []
D_3_p_estimates = []
for s in sample_size_list:
    D_2_p_estimates.append(estimate_pi(2, s))
    D_3_p_estimates.append(estimate_pi(3, s))
#Visualize the estimated Π for D = 2 and D = 3 cases
fig = go.Figure()
fig.add_trace(go.Scatter(x=sample_size_list, y=D_2_p_estimates,name='D=2'))
fig.add_trace(go.Scatter(x=sample_size_list, y=D_3_p_estimates,name='D=3'))
fig.update_layout(title='Sample size versus estimate of pi',
xaxis_title='Sample size',
yaxis_title='Estimate of pi')
fig.show()
As the sample size increases, I expect the estimate to get closer to the true value of pi, and that is indeed the case for both D = 2 and D = 3.
The D = 2 estimates come out more accurate than the D = 3 ones: as the dimension grows, the hypersphere occupies a smaller fraction of the cube, so fewer points land inside it and the estimate becomes noisier.
Although the results come out as expected, remember that this is the result of a single run; repeating the simulation several times and averaging the estimates would give a more reliable picture.
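The averaging idea can be sketched as follows (the function name, run count, and seed are illustrative choices, not from the notebook):

```python
import numpy as np

def estimate_pi_averaged(D, sample_size, n_runs=20, seed=0):
    """Average the Monte Carlo pi estimate over several independent runs."""
    rng = np.random.default_rng(seed)
    scale = {2: 4, 3: 6}[D]  # pi = scale * fraction inside the unit ball
    estimates = []
    for _ in range(n_runs):
        points = rng.uniform(-1, 1, size=(sample_size, D))
        fraction = np.mean(np.linalg.norm(points, axis=1) <= 1)
        estimates.append(scale * fraction)
    return np.mean(estimates)

print(estimate_pi_averaged(2, 10000))  # close to 3.14159...
```

Averaging over runs reduces the standard error by a factor of sqrt(n_runs) compared with a single run of the same sample size.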
Repeat this simulation, sampling 1000 D-dimensional points from 1 to 15 dimensions, where the value in each dimension is uniformly randomly distributed between -1 and +1. For each value of D, generate an additional 100 test instances and calculate the distance to each test instance’s nearest neighbor. Plot the average distance from the test instances to their nearest neighbors as a function of D.
list_of_D_dim_arrays = []
list_of_D_dim_arrays_test = []
test_size = 100
for D in range(1, dimension_count+1):
    list_of_D_dim_arrays.append(np.random.uniform(-1, 1, size=(sample_size, D)))
    list_of_D_dim_arrays_test.append(np.random.uniform(-1, 1, size=(test_size, D)))
x_list = []
y_list = []
for D in range(0, dimension_count):  # iterate over dimensions to find the average NN distance for each
    sum_of_distances = 0
    for test_point in range(0, test_size):  # iterate over test instances to sum nearest-neighbour distances
        nn_distance = np.inf
        for point in range(0, sample_size):  # iterate over train instances to find the nearest-neighbour distance
            euclidean_distance = (np.square(list_of_D_dim_arrays_test[D][test_point]-list_of_D_dim_arrays[D][point]).sum())**0.5
            if euclidean_distance <= nn_distance:
                nn_distance = euclidean_distance
        sum_of_distances = sum_of_distances + nn_distance
    avg_of_distances = sum_of_distances/test_size
    x_list.append(D+1)
    y_list.append(avg_of_distances)
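The nested distance search above runs test_size × sample_size Python-level iterations per dimension; the inner loops can be vectorized with broadcasting. A sketch, assuming `train` and `test` are 2-D arrays of shape (n, D) and (m, D):

```python
import numpy as np

def mean_nn_distance(train, test):
    """Average Euclidean distance from each test row to its nearest train row."""
    # (m, 1, D) - (1, n, D) broadcasts to an (m, n, D) array of differences
    diffs = test[:, None, :] - train[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)  # (m, n) pairwise distances
    return dists.min(axis=1).mean()        # nearest neighbour per test row

# Tiny deterministic check: train points at 0 and 2, test point at 0.5
train = np.array([[0.0], [2.0]])
test = np.array([[0.5]])
print(mean_nn_distance(train, test))  # -> 0.5
```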
#Plot the average distance from the test instances to their nearest neighbors as a function of D.
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_list, y=y_list))
fig.update_layout(title='D versus the average distance from the test instances to their nearest neighbors',
xaxis_title='D',
yaxis_title='average distance')
fig.show()
The average distance from the test instances to their nearest neighbors increases as the dimension increases.
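This is the curse of dimensionality: for a uniform point in [-1, 1]^D each coordinate has E[x_i^2] = 1/3, so the expected squared distance to the origin is D/3 and typical distances grow like sqrt(D). A quick empirical check (a sketch, not part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
for D in (1, 5, 15):
    points = rng.uniform(-1, 1, size=(100000, D))
    mean_sq_norm = np.mean(np.sum(points**2, axis=1))
    print(D, mean_sq_norm, D / 3)  # empirical value vs. theoretical D/3
```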
Read image as a variable in R/Python. You need to install “jpeg” package to read image into a variable if you use R. For Python, an alternative is to use matplotlib package. What is the structure of the variable that stores the image? What is the dimension? a. Display the image. (Hint: google “rasterImage”)
image = plt.imread('cigdem_foto.jpg')
print('Structure of the variable that stores the image is ',type(image))
Structure of the variable that stores the image is <class 'numpy.ndarray'>
print('Dimension is ',image.shape)
Dimension is (512, 512, 3)
imgplot = plt.imshow(image)
Display each channel as a separate image
imgplot = plt.imshow(image[:,:,0],cmap='Reds')
imgplot = plt.imshow(image[:,:,1],cmap='Greens')
imgplot = plt.imshow(image[:,:,2],cmap='Blues')
For each channel, take the average of the columns and plot the average as a line plot for each channel on a single plot.
red_channel = image[:,:,0]
green_channel = image[:,:,1]
blue_channel = image[:,:,2]
#For each channel, take the average of the columns and plot it as a line plot
x_list = list(range(1, image.shape[1] + 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_list, y=np.mean(red_channel,axis=0),name='red',line_color='red'))
fig.add_trace(go.Scatter(x=x_list, y=np.mean(green_channel,axis=0),name='green',line_color='green'))
fig.add_trace(go.Scatter(x=x_list, y=np.mean(blue_channel,axis=0),name='blue',line_color='blue'))
fig.update_layout(title='average of the columns',
xaxis_title='columns',
yaxis_title='average of columns')
fig.show()
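Note that `axis=0` collapses the rows, yielding one average per column, which is exactly what the x-axis above indexes. A tiny check of the axis semantics:

```python
import numpy as np

channel = np.array([[0, 10],
                    [4, 30]])
col_means = np.mean(channel, axis=0)  # average over rows -> one value per column
print(col_means)  # -> [ 2. 20.]
```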
For each channel, subtract one half of the image from the other half (choice of halves is up to you but dividing the head image vertically into two parts make more sense). If you observe negative pixel values, you can make them equal to zero. Then:
red_channel_half_difference = red_channel[:,0:256]-red_channel[:,256:]
green_channel_half_difference = green_channel[:,0:256]-green_channel[:,256:]
blue_channel_half_difference = blue_channel[:,0:256]-blue_channel[:,256:]
print('Number of negative elements in red half: ',red_channel_half_difference[red_channel_half_difference<0].shape[0])
print('Number of negative elements in green half: ',green_channel_half_difference[green_channel_half_difference<0].shape[0])
print('Number of negative elements in blue half: ',blue_channel_half_difference[blue_channel_half_difference<0].shape[0])
Number of negative elements in red half: 0
Number of negative elements in green half: 0
Number of negative elements in blue half: 0
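The zero counts above are a consequence of the dtype, not of the pixel values: `plt.imread` returns a uint8 array for JPEGs, and unsigned subtraction wraps around instead of going negative. To actually detect and zero out negative differences as the prompt suggests, one would cast to a signed type first; a sketch:

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

wrapped = a - b                                     # uint8 arithmetic wraps: 10 - 20 -> 246
signed = a.astype(np.int16) - b.astype(np.int16)    # true differences: -10, 100
clipped = np.clip(signed, 0, 255).astype(np.uint8)  # negatives set to zero

print(wrapped, signed, clipped)
```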
image_half_difference = np.dstack((red_channel_half_difference,green_channel_half_difference,blue_channel_half_difference))
imgplot = plt.imshow(image_half_difference)
imgplot = plt.imshow(red_channel_half_difference,cmap='Reds')
imgplot = plt.imshow(green_channel_half_difference,cmap='Greens')
imgplot = plt.imshow(blue_channel_half_difference,cmap='Blues')
In order to create a noisy image, add random noise from a uniform distribution with a minimum value of 0 and a maximum value of “0.1 * maximum pixel value observed” to each pixel value of each channel of the original image. • Display the new image. • Display each channel as a separate image.
noisy_image = image + np.random.uniform(0, image.max()*0.1, size=image.shape).astype(np.uint8)
imgplot = plt.imshow(noisy_image)
imgplot = plt.imshow(noisy_image[:,:,0],cmap='Reds')
imgplot = plt.imshow(noisy_image[:,:,1],cmap='Greens')
imgplot = plt.imshow(noisy_image[:,:,2],cmap='Blues')
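One caveat with the uint8 addition above: pixels near 255 can wrap around when the noise pushes them past the maximum. A safer variant (a sketch; the function name is illustrative, and `image` is assumed to be a uint8 array as read above) works in float and clips back to the valid range:

```python
import numpy as np

def add_uniform_noise(image, scale=0.1, seed=0):
    """Add uniform noise in [0, scale * max pixel] per pixel, clipping at 255."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0, image.max() * scale, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

# A bright pixel stays at 255 instead of wrapping around to a small value.
demo = np.full((2, 2, 3), 255, dtype=np.uint8)
print(add_uniform_noise(demo).max())  # -> 255
```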